Comparative analysis of targeted next-generation sequencing for Plasmodium falciparum drug resistance markers

Well-defined molecular resistance markers are available for a range of antimalarial drugs, and molecular surveillance is increasingly important for monitoring antimalarial drug resistance. Different genotyping platforms are available, but these have not been compared in detail. We compared Targeted Amplicon Deep sequencing (TADs) using Ion Torrent PGM with Illumina MiSeq for the typing of antimalarial drug resistance genes. We developed and validated protocols to type the molecular resistance markers pfcrt, pfdhfr, pfdhps, pfmdr1, pfkelch, and pfcytochrome b, in Plasmodium falciparum for the Ion Torrent PGM and Illumina MiSeq sequencing platforms. With P. falciparum 3D7 and K1 as reference strains, whole blood samples (N = 20) and blood spots from Rapid Diagnostic Test (RDT) samples (N = 5) from patients with uncomplicated falciparum malaria from Ubon Ratchathani were assessed on both platforms and compared for coverage (average reads per amplicon), sequencing accuracy, variant accuracy, false positive rate, false negative rate, and alternative allele detection, with conventional Sanger sequencing as the reference method for SNP calling. Both whole blood and RDT samples could be successfully sequenced using the Ion Torrent PGM and Illumina MiSeq platforms. Coverage of reads per amplicon was higher with Illumina MiSeq (28,886 reads) than with Ion Torrent PGM (1754 reads). In laboratory generated artificial mixed infections, the two platforms could detect the minor allele down to 1% density at 500X coverage. SNPs calls from both platforms were in complete agreement with conventional Sanger sequencing. The methods can be multiplexed with up to 96 samples per run, which reduces cost by 86% compared to conventional Sanger sequencing. Both platforms, using the developed TAD protocols, provide an accurate method for molecular surveillance of drug resistance markers in P. falciparum, but Illumina MiSeq provides higher coverage than Ion Torrent PGM.

The availability of efficacious antimalarial drug is pivotal for malaria control and elimination. Artemisininbased combination therapies (ACT) are currently the first line drugs for uncomplicated falciparum malaria in all malaria endemic countries, and new antimalarial compounds are not expected within the next 5 years. Unfortunately, artemisinin resistance has emerged in Southeast Asia and more recently in Sub-Saharan Africa. In the Greater Mekong Subregion of Southeast Asia this has been compounded by ACT partner drug resistance, resulting in a dramatic loss of efficacy for several ACTs. Monitoring of the emergence and spread of antimalarial drug resistance is crucial for defining national malaria treatment guidelines. Clinical therapeutic efficacy studies are the reference method for monitoring drug resistance, but in-vitro drug efficacy testing and assessment of molecular resistance markers in the parasite gene are important additional tools 1 . The arsenal of molecular markers is constantly increasing. Reliable genetic markers are now available for resistance against chloroquine (chloroquine resistance transporter, pfcrt), antifolate drugs (dihydrofolate reductase, pfdhfr), sulfonamides (dihydropteroate synthase, pfdhps), mefloquine (multidrug drug resistance 1, pfmdr1), artemisinins (kelch propeller region, pfkelch), piperaquine (plasmepsin II/III, pfpmII/III), and atovaquone (cytochrome B, pfcytb).
Next generation sequencing (NGS) using short reads is an increasingly used method to identify and track molecular markers for antimalarial drug resistance markers [2][3][4][5][6][7][8][9][10] . Two platforms, Ion Torrent PGM platform and Illumina MiSeq platform, are widely used, and are based on different principles. Ion Torrent PGM platform uses a semiconductor method to detect the proton that is released from the nucleotide incorporated during synthesis 11 , whereas Illumina platform uses fluorescently labeled reversible-terminator nucleotides incorporated in clonal amplified DNA captured in a flow cell 12 . These NGS techniques require only a small volume of DNA template, and are suitable for multiplexing hundreds of samples including different genetic markers in a single reaction run. In addition, NGS is highly accurate, fast and likely more cost-effective compared to conventional genotyping methods 2,4,13 .
The different NGS genotyping platforms have not been compared in detail. We here report a comparative study of Targeted Amplicon Deep sequencing (TADs) on two different NGS platforms, Ion Torrent PGM and Illumina MiSeq. We developed and validated protocols for the assessment of P. falciparum drug resistance markers for the two platforms and assessed the concordance of results with a reference method, Sanger sequencing by capillary electrophoresis (Fig. 1).

Results
Development of TADs using ion torrent PGM protocol for P. falciparum drug resistance markers. Target amplicons were produced from 20 whole blood samples, 5 RDT samples, and 6 artificial mixture samples containing a mix of 3D7 (reference) and K1 strain genomic DNA. The amplicons were pooled across Firstly, two TADs protocols were developed by using reference stain K1. Secondly, two TADs protocols were validated with P. falciparum strain K1 used as a positive control. Finally, two TADs protocols and conventional sequencing were performed in P. falciparum isolates from Ubon Ratchathani province and the SNPs results were compared.  (Tables S5 and S6) and used to calculate the sequencing accuracy, variant accuracy, false positive rate, and false negative rate, which were, respectively, 99.83% (571/572), 99.59% (241/242), 0.00% (0/242), and 0.00% (0/242) for both of the NGS protocols (Fig. 2). The one position of variant discrepancy between the two NGS platforms compared to the Sanger sequencing was one SNP in the pfdhfr gene at position I164L (chromosome 4 748,577 A > T) in the UBON0029 sample. This position was called as T by Sanger sequencing, but an A/T heterozygote on both NGS platforms. However, the minor allele frequency (nucleotide T) was assessed as 50.4% (414/821 reads) with Ion Torrent and as 21.74% (6152/28,297 reads) with Illumina MiSeq (Fig. 2).  Prevalence of resistance markers in P. falciparum from Ubon. The antimalarial drug resistant genes of P. falciparum, pfkelch, pfdhfr, pfdhps, pfcrt, pfmdr, and pfcytochrome B, were assessed by Sanger sequencing and the two TADs protocols in 17 isolates (12 whole blood samples and 5 RDT samples) from 2014 and 8 isolate of whole blood samples from 2017 collected from patients in Ubon Ratchatani. All 3 sequencing methods gave identical results with no discrepancies (Tables S5 and S6).
For pfmdr1, the prevalence of double mutants with haplotype Y184F, N1042D and F1226F was 12% (2/17), and of single mutants with haplotype Y184Y, N1042D and F1226F was 88% (15/17) in 2014. In 2017, the Table 1. Detection of mixed alleles in different artificial mixture ratios of P. falciparum 3D7 and K1. The percentage of K1 allele was calculated from five SNP positions (the coverage reads more than 500X in two TADs protocols) with triplicate sets of random subreads. The two NGS protocols can detect down to 1.00% of K1 strain. The coefficient of variant (CV) was calculated. The CV of 1.00% K1 strain mixture in the Ion Torrent PGM protocol was 0.18 and 0.32 in the Illumina MiSeq.  www.nature.com/scientificreports/ prevalence of triple mutants with haplotype Y184F, N1042D, and F1226Y of pfmdr1 was 50% (4/8) and of the single mutant with haplotype Y184Y, N1042D and F1226F was 50% (4/8).
In our study, NGS protocols were successfully developed for two widely used NGS platforms. We found that the number of total reads, and therefore the total number of reads mapped to the reference genome was higher with Illumina MiSeq compared to Ion Torrent PGM. The average number of sequence reads per amplicon was also higher using Illumina MiSeq than Ion Torrent PGM. These results are consistent with previous studies 2, 3,7 and can be explained by the higher capacity of Illumina MiSeq sequencing.
Both TADs methods using our NGS protocols provided highly accurate SNP calls. The P. falciparum SNPs in the K1 strain called from two TADs protocols were in complete agreement with the reference method. One minor allele variant in a 'heterozygous' mixed sample was identified correctly by the NGS method, but not picked up by Sanger sequencing, which can only identify 'homozygous' variants 7 .
Using our TADs protocols, both NGS platform showed a high sensitivity to detect minor alleles in artificially mixed P. falciparum strains. With 500X coverage of each of the 5 SNPs positions known to differ between the two assessed strains, including C59R, S108N (pfdhfr), A581G (pfdhps), N86Y (pfmdr1), and R371I (pfcrt), both platforms could detect the alternative allele down to a relative density of only 1% or 0.1 ng/µL. This is important  www.nature.com/scientificreports/ for antimalarial drug resistance surveillance, in particular in areas with higher transmission, where multiclonal infections will be common, and drug resistant minor clones can be easily missed. A recent study reported that for P. falciparum 10,000 reads were sufficient to detect a minor allele at a ratio of 1:1000 using TADs 26 . However, detection accuracy will depend on the method, as other studies showed that a minor allele population of 9% frequency was detected with 692 mean reads 27 , and at 5% frequency with 250 mean reads 28 . Our protocol described in this study enables the detection of a minor allele at 1% frequency with 500 reads. However, a minority of the SNPs sequenced using TADs in this study had read coverage of 500 reads or more, allowing us to assess only 12 positions in this study. The capacity to detect alleles present at very low frequencies could require high sequencing coverage, leading to a trade-off between detection of low frequency alleles and cost-effective surveillance. However, this is a key strength of the NGS approach, as Sanger sequencing, while accurate with high-frequency alleles, cannot detect these low frequency alleles.
Although not the primary aim of this study, our results showed persistence in Ubon Ratchathani of the pfcrt M74I + N75E + K76T + A220S + Q271E + N326S + G353V + I356T + R371I haplotype at 58% (7/12) in 2014 and 88% (7/8) in 2017. This might be related to the continued drug pressure on the parasite population by chloroquine, which is still used for the treatment of vivax malaria. The pfcrt mutation at position 353 has been associated with piperaquine resistance, in addition to the well described marker, Plasmepsin2 amplification 29,30 . Our study also showed persistence of mutations in the propeller region of the pfkelch gene, strongly associated with artemisinin resistance pfkelch 31 and confirmed by other studies from the same region [32][33][34][35] . Since our study, the first-line drug for the treatment of uncomplicated falciparum malaria in Eastern Thailand was changed from dihydroartemisinin-piperaquine to artesunate-pyronaridine. Continued surveillance for drug resistance will remain important as changes in treatment alter the selective pressure on different resistance variants.
NGS is cheaper compared to conventional methods. In our study the total cost of consumables to sequence six drug resistance genes per sample was $62.72 for the Ion Torrent PGM platform and $60.87 for Illumina MiSeq when multiplexing 96 samples per sequencing run. This compared to a cost of $450.90 using conventional Sanger sequencing). The NGS platforms, particularly the MiSeq, allow for more scalability to further reduce costs as samples can be multiplexed at higher numbers. Moreover, using our protocol cost and time were reduced by using multiplex PCR of targeted amplicons by selecting primer pairs with the same annealing temperature and similar size, and dividing the amplicons into two multiplex pools to ensure they are not overlapping. The amplicons could be designed to give different PCR product sizes to allow for extraction of the gel bands. An alternative could be to skip the gel extraction step, thus further reducing costs and time. The Ion Torrent PGM protocol we developed used 18 targeted PCR reactions with similar annealing temperatures which could be run as two multiplex pools, whereas the Illumina MiSeq protocol used 17 targeted PCR reactions also run as two multiplex pools. Costs can be reduced further by amplifying only known drug resistance marker regions rather than complete gene sequences. It should be recognized, though, that this precludes detecting novel drug resistance mutations falling outside these regions. A targeted approach translates to 26 targeted PCR reactions for the Ion Torrent PGM platform and 16 targeted PCR reactions for llumina MiSeq.
In summary, we developed two Targeted Amplicon Deep sequencing (TADs) protocols for genotyping of P. falciparum drug resistant markers, targeting the full length of five drug resistant genes (pfmdr1, pfkelch, pfcrt, pfdhfr, and pfdhps) and pfcytochrome b (position 268 only) using the Ion Torrent PGM and Illumina MiSeq platforms. The cost per sample was slightly lower with Illumina MiSeq compared to Ion Torrent PGM, but the latter method does not require bioinformatics expertise for data analysis. There are several advantages of these platforms compared to Sanger sequencing with capillary electrophoresis, which is not suitable for high throughput analysis, is labor-intensive and expensive. We conclude that both platforms using the newly developed TADs protocols are suitable for the surveillance of P. falciparum drug resistance markers in malaria endemic areas.

Development of ion torrent PGM protocol. Primer design. 70 oligonucleotide primer pairs (see
Targeted amplicon amplification. Singleplex PCR was performed separately on all samples and control mixtures to amplify the targeted amplicons of six antimalarial drug resistance genes with Q5 High-Fidelity DNA polymerase (New England BioLabs, MA, USA). The PCR products were purified and concentrated using FavorPrep GEL/PCR Purification Mini Kit. The purified PCR products were quantified and improved the purity by Nanodrop (Thermo Fisher Scientific, USA). The concentration of purified PCR products were adjusted to 7.6 ng/ul, and then the PCR amplicons were pooled. . then, the variant data were filtered by custom criteria: variant quality score was greater than 10, the variant was supported by both forward and reverse sequence reads, the minimum frequency of variant is more than 2% and the minimum coverage of variant is more than 10 reads. Custom criteria were not applied for variant calling for the artificial mixture models.
Targeted amplicon amplification. Singleplex PCR was performed separately on all samples and control mixtures to amplify the targeted amplicon of six antimalarial drug resistance genes with Q5 High-Fidelity DNA polymerase (New England BioLabs, MA, USA). The PCR products were purified using FavorPrep GEL/PCR Purification Mini Kit and quantified by Nanodrop (Thermo Fisher Scientific, USA). The concentration of purified PCR product was adjusted to 33 nM, and then the PCR amplicons were pooled.  43 . The variant data were filtered by custom criteria which similar to Ion Torrent PGM: the variant quality score was greater than 10, the variant was supported by both forward and reverse sequence reads, the minimum frequency of variant is more than 2% and the minimum coverage of variant is more than 10 reads. SNPs were annotated using P. falciparum 3D7 as reference by IGV program (version 2.3; https:// softw are. broad insti tute. org/ softw are/ igv/) 44 . Custom criteria were not applied for variant calling for the artificial mixture models.
Alternative allele detection. The alternative allele detection of the two NGS platforms was calculated by using an artificial mixture of two strains. Two strains artificial mixture were performed by mixing 10 ng/ul parasite genomic DNA of P. falciparum strains 3D7 (MRA -102G) and K1 (MRA-159G) in different ratio of volume with the same standard genomic concentration from BEI Resources (https:// www. beire sourc es. org/ Home. aspx). The ratios used were 0:100 (3D7 0 ul: K1 100 ul), 25 , had coverage more than 500X and were used in alternative allele detection, respectively. Samtools view program (version 1.9; https:// sourc eforge. net/ proje cts/ samto ols/ files/ samto ols/1. 9/) was used to perform subsampling of reads at the mutation position with triplicate random subsamples. IGV program (version 2.3; https:// softw are. broad insti tute. org/ softw are/ igv/) was used to visualize the sequence reads.
All methods were carried out in accordance with relevant guidelines and regulations.
Cost calculation. The cost was calculated per sample, with cost including consumable plasticware, reagents from DNA extraction step to sequencing step, and primer for each sequencing platform (conventional sequencing: ABI 3031XL genetic analyser, Ion Torrent PGM, and Illumina Miseq).
Ethics approval and consent to participate. Ethical approvals for the study were obtained from the ethical review committees of the Faculty of Tropical Medicine, Mahidol University (MUTM2020-044-01). Informed consent was obtained from all participants.